Overview

Dataset statistics

Number of variables: 20
Number of observations: 19812
Missing cells: 17184
Missing cells (%): 4.3%
Duplicate rows: 557
Duplicate rows (%): 2.8%
Total size in memory: 2.2 MiB
Average record size in memory: 118.0 B
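The percentages in this overview follow from the raw counts. A quick consistency check, written as a sketch: the two per-column missing counts are taken from the Warnings section of this report, and the variable names are illustrative only.

```python
# Counts as reported in this overview and in its warnings.
n_rows, n_cols = 19812, 20
missing_cells = 12992 + 4192   # type_of_sale + building, the only columns reporting missing values
duplicate_rows = 557

# Missing cells are counted against all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_rows * n_cols)
# Duplicate rows are counted against all rows.
duplicate_pct = 100 * duplicate_rows / n_rows

print(missing_cells)            # 17184
print(round(missing_pct, 1))    # 4.3
print(round(duplicate_pct, 1))  # 2.8
```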

Variable types

NUM: 9
BOOL: 7
CAT: 4

Warnings

nr_facades has constant value "0" in all 19812 rows [Constant]
Dataset has 557 (2.8%) duplicate rows [Duplicates]
basement is highly correlated with land [High correlation]
land is highly correlated with basement [High correlation]
type_subproperty is highly correlated with type_property [High correlation]
type_property is highly correlated with type_subproperty [High correlation]
type_of_sale has 12992 (65.6%) missing values [Missing]
building has 4192 (21.2%) missing values [Missing]
netHabitableSurface is highly skewed (γ1 = 51.83050046) [Skewed]
nr_bedrooms is highly skewed (γ1 = 25.94489667) [Skewed]
garden_m2 is highly skewed (γ1 = 29.79909112) [Skewed]
land is highly skewed (γ1 = 20.56953032) [Skewed]
basement is highly skewed (γ1 = 21.31584435) [Skewed]
netHabitableSurface has 2728 (13.8%) zeros [Zeros]
nr_bedrooms has 2492 (12.6%) zeros [Zeros]
garden_m2 has 16425 (82.9%) zeros [Zeros]
terrace_m2 has 12429 (62.7%) zeros [Zeros]
land has 10685 (53.9%) zeros [Zeros]
basement has 1439 (7.3%) zeros [Zeros]
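The zero, duplicate and skewness warnings above can be approximated with a few lines of plain Python. This is a minimal sketch, not the report's own implementation: the helper names are hypothetical, the sample data is made up, and the skewness here is the simple moment-based g1, so it may differ slightly from the γ1 values reported above if the tool applies a small-sample adjustment.

```python
from collections import Counter

def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

def count_duplicate_rows(rows):
    """Number of rows that repeat an earlier, identical row (assumed definition)."""
    counts = Counter(tuple(r) for r in rows)
    return sum(c - 1 for c in counts.values())

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; requires non-constant data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up garden sizes: mostly zeros plus one very large value.
garden = [0, 0, 0, 0, 0, 0, 0, 0, 100, 5000]
print(zero_share(garden))      # 0.8
print(skewness(garden) > 0)    # True: a long right tail gives positive skew
print(count_duplicate_rows([(1, "a"), (1, "a"), (2, "b"), (1, "a")]))  # 2
```

A column that is mostly zeros with a few very large values, such as garden_m2 here, will typically trigger both the Zeros and the Skewed warning at once.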

Reproduction

Analysis started: 2020-11-19 10:52:47.189232
Analysis finished: 2020-11-19 10:53:09.380902
Duration: 22.19 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Unnamed: 0
Real number (ℝ≥0)

Distinct19253
Distinct (%)97.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8727770.808
Minimum2884976
Maximum8957547
Zeros0
Zeros (%)0.0%
Memory size154.8 KiB

Quantile statistics

Minimum2884976
5-th percentile8082832.1
Q18693416.25
median8839908.5
Q38911999.25
95-th percentile8948774.05
Maximum8957547
Range6072571
Interquartile range (IQR)218583

Descriptive statistics

Standard deviation335383.9802
Coefficient of variation (CV)0.03842722129
Kurtosis33.79564081
Mean8727770.808
Median Absolute Deviation (MAD)86831.5
Skewness-4.298438317
Sum1.729145952e+11
Variance1.124824142e+11
Monotonicity: Not monotonic
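Several of the statistics listed for this variable are derived from the others: the range from minimum and maximum, the IQR from the quartiles, the coefficient of variation from standard deviation and mean, and the variance from the standard deviation. A small sketch that recomputes them from the reported figures (the variable names are illustrative only):

```python
import math

# Values copied from the "Unnamed: 0" statistics in this report.
minimum, maximum = 2884976, 8957547
q1, q3 = 8693416.25, 8911999.25
mean, std = 8727770.808, 335383.9802

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range)  # 6072571
print(iqr)          # 218583.0
# The rounded inputs reproduce the reported CV and variance to within rounding.
print(math.isclose(cv, 0.03842722129, rel_tol=1e-8))
print(math.isclose(variance, 1.124824142e11, rel_tol=1e-6))
```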
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
88350292< 0.1%
 
88930362< 0.1%
 
89440012< 0.1%
 
88278262< 0.1%
 
89226482< 0.1%
 
82547952< 0.1%
 
89489592< 0.1%
 
84594842< 0.1%
 
88812452< 0.1%
 
88843652< 0.1%
 
Other values (19243)1979299.9%
 
ValueCountFrequency (%) 
28849761< 0.1%
 
38167541< 0.1%
 
38559731< 0.1%
 
38806501< 0.1%
 
39937801< 0.1%
 
ValueCountFrequency (%) 
89575471< 0.1%
 
89575371< 0.1%
 
89575351< 0.1%
 
89575161< 0.1%
 
89575151< 0.1%
 

type_property
Categorical

HIGH CORRELATION

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size154.8 KiB
ValueCountFrequency (%) 
HOUSE930947.0%
 
APARTMENT858243.3%
 
COMMERCIAL11135.6%
 
INDUSTRY2851.4%
 
OFFICE2751.4%
 
OTHER2021.0%
 
GARAGE430.2%
 
LAND3< 0.1%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length10
Median length8
Mean length7.072632748
Min length4

type_subproperty
Categorical

HIGH CORRELATION

Distinct49
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size154.8 KiB
ValueCountFrequency (%) 
APARTMENT647232.7%
 
HOUSE460223.2%
 
APARTMENT_BLOCK15838.0%
 
MIXED_USE_BUILDING13887.0%
 
VILLA9905.0%
 
MIXED_USE_BUILDING_COMMERCIAL7233.6%
 
DUPLEX5802.9%
 
PENTHOUSE5262.7%
 
GROUND_FLOOR4192.1%
 
FLAT_STUDIO3021.5%
 
Other values (39)222711.2%
 
Frequencies of value counts

Unique

Unique: 2
Unique (%): < 0.1%
Histogram of lengths of the category

Length

Max length29
Median length9
Mean length10.27422774
Min length3

price
Real number (ℝ≥0)

Distinct2062
Distinct (%)10.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean504305.0275
Minimum0
Maximum15000000
Zeros30
Zeros (%)0.2%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile121550
Q1220000
median325000
Q3535000
95-th percentile1495000
Maximum15000000
Range15000000
Interquartile range (IQR)315000

Descriptive statistics

Standard deviation624065.9008
Coefficient of variation (CV)1.237477056
Kurtosis58.74828977
Mean504305.0275
Median Absolute Deviation (MAD)130000
Skewness5.736459063
Sum9991291204
Variance3.894582485e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
2950002341.2%
 
2990002241.1%
 
2750002221.1%
 
1990002201.1%
 
3950002121.1%
 
2250002021.0%
 
2490002001.0%
 
2500001810.9%
 
4950001720.9%
 
3250001710.9%
 
Other values (2052)1777489.7%
 
ValueCountFrequency (%) 
0300.2%
 
7001< 0.1%
 
25004< 0.1%
 
30001< 0.1%
 
60001< 0.1%
 
ValueCountFrequency (%) 
150000001< 0.1%
 
127000001< 0.1%
 
115000001< 0.1%
 
100000001< 0.1%
 
95000001< 0.1%
 

locality
Real number (ℝ≥0)

Distinct941
Distinct (%)4.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5142.060267
Minimum1000
Maximum9992
Zeros0
Zeros (%)0.0%
Memory size154.8 KiB

Quantile statistics

Minimum1000
5-th percentile1050
Q11830
median4870
Q38380
95-th percentile9470
Maximum9992
Range8992
Interquartile range (IQR)6550

Descriptive statistics

Standard deviation3150.37723
Coefficient of variation (CV)0.6126682821
Kurtosis-1.600802044
Mean5142.060267
Median Absolute Deviation (MAD)3430
Skewness0.02093949828
Sum101874498
Variance9924876.689
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
83007713.9%
 
11805592.8%
 
90004822.4%
 
10004642.3%
 
10504122.1%
 
84002811.4%
 
40002571.3%
 
10702291.2%
 
80002151.1%
 
20002121.1%
 
Other values (931)1593080.4%
 
ValueCountFrequency (%) 
10004642.3%
 
1020750.4%
 
10301901.0%
 
10401140.6%
 
10504122.1%
 
ValueCountFrequency (%) 
99923< 0.1%
 
9991110.1%
 
9990290.1%
 
99883< 0.1%
 
99811< 0.1%
 

netHabitableSurface
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct862
Distinct (%)4.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean197.5509287
Minimum0
Maximum50000
Zeros2728
Zeros (%)13.8%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q180
median130
Q3220
95-th percentile521
Maximum50000
Range50000
Interquartile range (IQR)140

Descriptive statistics

Standard deviation521.8963402
Coefficient of variation (CV)2.641831874
Kurtosis4396.795598
Mean197.5509287
Median Absolute Deviation (MAD)65
Skewness51.83050046
Sum3913879
Variance272375.7899
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
0272813.8%
 
1203071.5%
 
1002931.5%
 
1502831.4%
 
902741.4%
 
2002671.3%
 
1602431.2%
 
802301.2%
 
1102271.1%
 
1802221.1%
 
Other values (852)1473874.4%
 
ValueCountFrequency (%) 
0272813.8%
 
52< 0.1%
 
154< 0.1%
 
164< 0.1%
 
175< 0.1%
 
ValueCountFrequency (%) 
500001< 0.1%
 
210001< 0.1%
 
155111< 0.1%
 
108371< 0.1%
 
100002< 0.1%
 

nr_bedrooms
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct49
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.894306481
Minimum0
Maximum204
Zeros2492
Zeros (%)12.6%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q12
median3
Q34
95-th percentile6
Maximum204
Range204
Interquartile range (IQR)2

Descriptive statistics

Standard deviation3.912543218
Coefficient of variation (CV)1.351806813
Kurtosis1151.16406
Mean2.894306481
Median Absolute Deviation (MAD)1
Skewness25.94489667
Sum57342
Variance15.30799444
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)
ValueCountFrequency (%) 
2531426.8%
 
3509425.7%
 
0249212.6%
 
4243512.3%
 
117889.0%
 
512126.1%
 
66373.2%
 
72601.3%
 
81690.9%
 
91000.5%
 
Other values (39)3111.6%
 
ValueCountFrequency (%) 
0249212.6%
 
117889.0%
 
2531426.8%
 
3509425.7%
 
4243512.3%
 
ValueCountFrequency (%) 
2043< 0.1%
 
1001< 0.1%
 
991< 0.1%
 
902< 0.1%
 
801< 0.1%
 
kitchen_installed
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size19.3 KiB
True
14379 
False
5433 
ValueCountFrequency (%) 
True1437972.6%
 
False543327.4%
 

nr_facades
Boolean

CONSTANT
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size154.8 KiB
0
19812 
ValueCountFrequency (%) 
019812100.0%
 

hasGarden
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size19.3 KiB
False
16425 
True
3387 
ValueCountFrequency (%) 
False1642582.9%
 
True338717.1%
 

garden_m2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct769
Distinct (%)3.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean164.5301333
Minimum0
Maximum94000
Zeros16425
Zeros (%)82.9%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile510
Maximum94000
Range94000
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1681.836659
Coefficient of variation (CV)10.22205857
Kurtosis1149.066964
Mean164.5301333
Median Absolute Deviation (MAD)0
Skewness29.79909112
Sum3259671
Variance2828574.547
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
01642582.9%
 
1001100.6%
 
50830.4%
 
200790.4%
 
300690.3%
 
150620.3%
 
500590.3%
 
60590.3%
 
40590.3%
 
400580.3%
 
Other values (759)274913.9%
 
ValueCountFrequency (%) 
01642582.9%
 
1580.3%
 
22< 0.1%
 
31< 0.1%
 
43< 0.1%
 
ValueCountFrequency (%) 
940001< 0.1%
 
750001< 0.1%
 
630001< 0.1%
 
580001< 0.1%
 
550002< 0.1%
 

hasTerrace
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size19.3 KiB
True
11602 
False
8210 
ValueCountFrequency (%) 
True1160258.6%
 
False821041.4%
 

terrace_m2
Real number (ℝ≥0)

ZEROS

Distinct177
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean10.50585504
Minimum0
Maximum1383
Zeros12429
Zeros (%)62.7%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q312
95-th percentile50
Maximum1383
Range1383
Interquartile range (IQR)12

Descriptive statistics

Standard deviation27.22180594
Coefficient of variation (CV)2.591108086
Kurtosis388.4119651
Mean10.50585504
Median Absolute Deviation (MAD)0
Skewness11.83684498
Sum208142
Variance741.0267185
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
01242962.7%
 
104352.2%
 
204252.1%
 
153551.8%
 
83271.7%
 
63051.5%
 
122871.4%
 
302851.4%
 
252611.3%
 
92581.3%
 
Other values (167)444522.4%
 
ValueCountFrequency (%) 
01242962.7%
 
1290.1%
 
21130.6%
 
31690.9%
 
42101.1%
 
ValueCountFrequency (%) 
13831< 0.1%
 
7081< 0.1%
 
4951< 0.1%
 
4502< 0.1%
 
4003< 0.1%
 
furnished_YN
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size19.3 KiB
False
19184 
True
 
628
ValueCountFrequency (%) 
False1918496.8%
 
True6283.2%
 

swimpool_YN
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size19.3 KiB
False
19330 
True
 
482
ValueCountFrequency (%) 
False1933097.6%
 
True4822.4%
 

type_of_sale
Categorical

MISSING

Distinct7
Distinct (%)0.1%
Missing12992
Missing (%)65.6%
Memory size154.8 KiB
ValueCountFrequency (%) 
isNewClassified236311.9%
 
isNewlyBuilt17749.0%
 
isUnderOption16548.3%
 
isAnInteractiveSale4292.2%
 
isNewPrice4052.0%
 
isNotarySale1680.8%
 
isSoldOrRented270.1%
 
(Missing)1299265.6%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length19
Median length3
Mean length6.652836665
Min length3

land
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct2166
Distinct (%)10.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean709.4122754
Minimum0
Maximum220000
Zeros10685
Zeros (%)53.9%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q3350
95-th percentile2300
Maximum220000
Range220000
Interquartile range (IQR)350

Descriptive statistics

Standard deviation4358.677865
Coefficient of variation (CV)6.144068853
Kurtosis622.8964832
Mean709.4122754
Median Absolute Deviation (MAD)0
Skewness20.56953032
Sum14054876
Variance18998072.73
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
01068553.9%
 
150950.5%
 
100940.5%
 
120810.4%
 
200720.4%
 
110700.4%
 
70650.3%
 
300640.3%
 
160610.3%
 
1000550.3%
 
Other values (2156)847042.8%
 
ValueCountFrequency (%) 
01068553.9%
 
1210.1%
 
21< 0.1%
 
31< 0.1%
 
42< 0.1%
 
ValueCountFrequency (%) 
2200001< 0.1%
 
1500001< 0.1%
 
1200001< 0.1%
 
1178001< 0.1%
 
1100001< 0.1%
 

basement
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct2085
Distinct (%)10.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean720.5367454
Minimum0
Maximum220000
Zeros1439
Zeros (%)7.3%
Memory size154.8 KiB

Quantile statistics

Minimum0
5-th percentile0
Q173
median125
Q3340
95-th percentile2139
Maximum220000
Range220000
Interquartile range (IQR)267

Descriptive statistics

Standard deviation4282.219668
Coefficient of variation (CV)5.943096858
Kurtosis663.1824817
Mean720.5367454
Median Absolute Deviation (MAD)85
Skewness21.31584435
Sum14275274
Variance18337405.28
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
014397.3%
 
1003011.5%
 
902681.4%
 
702511.3%
 
802401.2%
 
1202271.1%
 
1102191.1%
 
852091.1%
 
751911.0%
 
601720.9%
 
Other values (2075)1629582.2%
 
ValueCountFrequency (%) 
014397.3%
 
1490.2%
 
2880.4%
 
3630.3%
 
4810.4%
 
ValueCountFrequency (%) 
2200001< 0.1%
 
1500001< 0.1%
 
1200001< 0.1%
 
1178001< 0.1%
 
1100001< 0.1%
 

building
Categorical

MISSING

Distinct7
Distinct (%)< 0.1%
Missing4192
Missing (%)21.2%
Memory size154.8 KiB
ValueCountFrequency (%) 
AS_NEW558028.2%
 
GOOD534427.0%
 
TO_BE_DONE_UP13756.9%
 
TO_RENOVATE11916.0%
 
JUST_RENOVATED10735.4%
 
Not specified9534.8%
 
TO_RESTORE1040.5%
 
(Missing)419221.2%
 
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length14
Median length6
Mean length6.403139511
Min length3

fireplaceExist
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size19.3 KiB
False
18841 
True
 
971
ValueCountFrequency (%) 
False1884195.1%
 
True9714.9%
 

Interactions

(81 interaction plots, one per ordered pair of the 9 numeric variables; images not preserved.)

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
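The calculation described above can be sketched in a few lines of plain Python. This is an illustration, not the report's own code; the function name and the sample data are made up. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # 0.7746

# Shifting and rescaling one variable leaves r unchanged.
rescaled = [10 * x + 3 for x in xs]
print(round(pearson_r(rescaled, ys), 4))  # 0.7746
```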

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
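As a sketch of that recipe, the values are first replaced by their ranks (ties receive the average of the positions they occupy, which is an assumption about tie handling), and Pearson's formula is then applied to the ranks. The function names and the sample data are illustrative only.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mean_x) ** 2 for x in xs) *
                      sum((y - mean_y) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]           # monotonic but not linear
print(average_ranks([10, 20, 20, 30])) # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho(xs, cubes))         # 1.0 up to floating-point rounding
print(pearson_r(xs, cubes) < 1.0)      # True
```

The cubic example shows the difference the text describes: the relationship is perfectly monotonic, so ρ reaches 1, while Pearson's r stays below 1 because the relationship is not linear.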

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
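A direct, quadratic-time sketch of this definition is shown below. It implements the plain form stated in the text, in which tied pairs count as neither concordant nor discordant; statistical libraries often use a tie-adjusted variant instead, so results on data with many ties may differ. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

print(kendall_tau([1, 2, 3], [1, 2, 3]))  # 1.0
print(kendall_tau([1, 2, 3], [3, 2, 1]))  # -1.0
# Two concordant pairs and one discordant pair out of three: (2 - 1) / 3.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```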

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
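A minimal sketch of a bias-corrected Cramér's V in the style of Bergsma's correction is given below. It is an illustration under stated assumptions rather than the report's implementation: the function name and the sample labels are made up, the chi-squared statistic is computed without a continuity correction, and degenerate inputs (fewer than two categories) simply return 0.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    phi2 = chi2 / n
    # Corrections to phi squared and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

kind = ["HOUSE"] * 50 + ["APARTMENT"] * 50
print(cramers_v_corrected(kind, kind))    # perfect association, approximately 1.0

left = ["a", "a", "b", "b"] * 25
right = ["c", "d", "c", "d"] * 25
print(cramers_v_corrected(left, right))   # 0.0: the two labels are independent
```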

Missing values

(Missing-value plots; images not preserved.)

Sample

First rows

Unnamed: 0 | type_property | type_subproperty | price | locality | netHabitableSurface | nr_bedrooms | kitchen_installed | nr_facades | hasGarden | garden_m2 | hasTerrace | terrace_m2 | furnished_YN | swimpool_YN | type_of_sale | land | basement | building | fireplaceExist
08901695HOUSEMIXED_USE_BUILDING29500041802423True0True1000True36FalseFalseisNewPrice14031403GOODFalse
18747010HOUSEVILLA67500087303494True0True977False0FalseFalseisNewPrice15261526AS_NEWFalse
28775843HOUSEAPARTMENT_BLOCK25000040203035True0False0False0FalseFalseisNewPrice760760TO_RENOVATEFalse
38910441HOUSEHOUSE54500012002354True0False0False0TrueFalseNaN6363JUST_RENOVATEDFalse
48758672HOUSEMIXED_USE_BUILDING50000011902202True0True60False0FalseFalseNaN193193AS_NEWFalse
58725100COMMERCIALMIXED_USE_BUILDING_COMMERCIAL22950045002300True0False0False0FalseFalseisNewPrice128128TO_BE_DONE_UPFalse
68940340HOUSEHOUSE18900040402003True0True40False0FalseFalseisNewClassified10011TO_BE_DONE_UPFalse
78923626HOUSEMIXED_USE_BUILDING46500045404004True0False0False0FalseFalseisUnderOption312312GOODFalse
88913667HOUSEAPARTMENT_BLOCK65000011502004True0True150True4FalseFalseisUnderOption301301GOODFalse
98713285OFFICEBUILDING35000070907000False0False0True0FalseFalseNaN540540TO_RESTOREFalse

Last rows

Unnamed: 0 | type_property | type_subproperty | price | locality | netHabitableSurface | nr_bedrooms | kitchen_installed | nr_facades | hasGarden | garden_m2 | hasTerrace | terrace_m2 | furnished_YN | swimpool_YN | type_of_sale | land | basement | building | fireplaceExist
198028367880APARTMENTAPARTMENT18900040001223True0False0False0FalseFalseisNewlyBuilt0122AS_NEWFalse
198038727088APARTMENTAPARTMENT1200008430361True0False0False0FalseFalseNaN036GOODFalse
198048881075APARTMENTAPARTMENT26750091201012True0False0True22FalseFalseisNewPrice0101GOODFalse
198058903751APARTMENTPENTHOUSE975000100002True0False0True80FalseFalseisNewlyBuilt06AS_NEWFalse
198068863083APARTMENTAPARTMENT2080008300501True0False0True10FalseFalseNaN050JUST_RENOVATEDFalse
198078876673APARTMENTAPARTMENT48000010401022True0False0True13TrueFalseNaN03AS_NEWFalse
198088948452APARTMENTAPARTMENT1300006887841True0False0False0FalseFalseisNewlyBuilt084JUST_RENOVATEDFalse
198098887317APARTMENTAPARTMENT3350001200892True0False0True12FalseFalseisNewClassified089AS_NEWFalse
198108944979APARTMENTAPARTMENT980004480633True0False0False0FalseFalseisNewClassified07TO_RENOVATEFalse
198118913656APARTMENTAPARTMENT1950004000962True0False0True0FalseFalseNaN096GOODFalse

Duplicate rows

Most frequent

Unnamed: 0 | type_property | type_subproperty | price | locality | netHabitableSurface | nr_bedrooms | kitchen_installed | nr_facades | hasGarden | garden_m2 | hasTerrace | terrace_m2 | furnished_YN | swimpool_YN | type_of_sale | land | basement | building | fireplaceExist | count
08009981HOUSEAPARTMENT_BLOCK72500020183604True0False0False0FalseFalseisUnderOption160160JUST_RENOVATEDFalse2
18016112HOUSEEXCEPTIONAL_PROPERTY82000017023674True0True9143True30FalseFalseisNewPrice91439143GOODFalse2
28035441HOUSEMIXED_USE_BUILDING45000067602595True0False0False0FalseFalseisUnderOption115115JUST_RENOVATEDFalse2
38040133HOUSEMIXED_USE_BUILDING12500040201253True0False0False0FalseFalseisUnderOption6060TO_RENOVATEFalse2
48088715HOUSEMIXED_USE_BUILDING39500067676484True0False0False0FalseFalseisUnderOption17881788JUST_RENOVATEDFalse2
58122926HOUSEMIXED_USE_BUILDING28500001200230015True0False0True40FalseFalseisUnderOption16331633TO_RENOVATEFalse2
68196608HOUSEMIXED_USE_BUILDING14900048603455True0False0True25FalseFalseisUnderOption030TO_RENOVATEFalse2
78205295HOUSEMIXED_USE_BUILDING50000041303253True0False0True120FalseFalseisUnderOption300300GOODFalse2
88349953HOUSEMIXED_USE_BUILDING49500070005406True0False0True0FalseFalseisUnderOption0540GOODFalse2
98383911HOUSEMIXED_USE_BUILDING129000700000False0False0False0FalseFalseisUnderOption00TO_RENOVATEFalse2